2 1 N ov 2 01 4 Graph - Sparse LDA : A Topic Model with Structured
نویسندگان
چکیده
Originally designed to model text, topic modeling has become a powerful tool for uncovering latent structure in domains including medicine, finance, and vision. The goals for the model vary depending on the application: in some cases, the discovered topics may be used for prediction or some other downstream task. In other cases, the content of the topic itself may be of intrinsic scientific interest. Unfortunately, even using modern sparse techniques, the discovered topics are often difficult to interpret due to the high dimensionality of the underlying space. To improve topic interpretability, we introduce Graph-Sparse LDA, a hierarchical topic model that leverages knowledge of relationships between words (e.g., as encoded by an ontology). In our model, topics are summarized by a few latent concept-words from the underlying graph that explain the observed words. Graph-Sparse LDA recovers sparse, interpretable summaries on two real-world biomedical datasets while matching state-of-the-art prediction performance.
منابع مشابه
Graph-Sparse LDA: A Topic Model with Structured Sparsity
Originally designed to model text, topic modeling has become a powerful tool for uncovering latent structure in domains including medicine, finance, and vision. The goals for the model vary depending on the application: in some cases, the discovered topics may be used for prediction or some other downstream task. In other cases, the content of the topic itself may be of intrinsic scientific int...
متن کاملA Sparse Graph-Structured Lasso Mixed Model for Genetic Association with Confounding Correction
While linear mixed model (LMM) has shown a competitive performance in correcting spurious associations raised by population stratification, family structures, and cryptic relatedness, more challenges are still to be addressed regarding the complex structure of genotypic and phenotypic data. For example, geneticists have discovered that some clusters of phenotypes are more co-expressed than othe...
متن کاملBayesian group factor analysis with structured sparsity
Latent factor models are the canonical statistical tool for exploratory analyses of lowdimensional linear structure for an observation matrix with p features across n samples. We develop a structured Bayesian group factor analysis model that extends the factor model to multiple coupled observation matrices; in the case of two observations, this reduces to a Bayesian model of canonical correlati...
متن کاملAccelerating Topic Model Training on a Single Machine
We present the design and implementation of GLDA, a library that utilizes the GPU (Graphics Processing Unit) to perform Gibbs sampling of Latent Dirichlet Allocation (LDA) on a single machine. LDA is an effective topic model used in many applications, e.g., classification, feature selection, and information retrieval. However, training an LDA model on large data sets takes hours, even days, due...
متن کاملیک مدل موضوعی احتمالاتی مبتنی بر روابط محلّی واژگان در پنجرههای همپوشان
A probabilistic topic model assumes that documents are generated through a process involving topics and then tries to reverse this process, given the documents and extract topics. A topic is usually assumed to be a distribution over words. LDA is one of the first and most popular topic models introduced so far. In the document generation process assumed by LDA, each document is a distribution o...
متن کامل